Abstract: We consider the task of scheduling a crawler to retrieve content from several sites
with ephemeral content. A user typically loses interest in ephemeral content, such as news or posts in
social network groups, after several days or hours. Thus, the development of a timely crawling policy
for such ephemeral information sources is very important. We first formulate this problem as an
optimal control problem with average reward. The reward can be measured in the number of
clicks or relevant search requests. The problem in its initial formulation suffers from the curse of
dimensionality and quickly becomes intractable even with a moderate number of information sources.
Fortunately, this problem admits a Whittle index, which leads to problem decomposition and to a
very simple and efficient crawling policy. We derive the Whittle index and provide its theoretical
justification.
Whittle Index for Crawling Ephemeral Content

Keywords: Whittle index, search engine, crawler, robot, ephemeral content
1 Introduction

Nowadays an overwhelming majority of people find new information on the web at news sites, 
blogs, forums and social networking groups. Moreover, most information consumed is ephemeral 
in nature, that is, people tend to lose their interest in the content in several days or hours. The 
interest in a content can be measured in terms of clicks or number of relevant search requests. 
It has been demonstrated that the interest decreases exponentially over time unmanH]. 

In a series of works (see e.g., [TJ [HJ HI HI] and references therein) the authors address the 
problem of refreshing documents in a database. However, these works do not consider the 
ephemeral nature of the information. Motivated by this challenge, the authors of [15] suggest a 
procedure for optimal crawling of ephemeral content. Specifically, the authors of [15] formulate an
optimization problem for finding optimal frequencies of crawling for various information sources.

The approach presented in [15] is static, in the sense that the distribution of crawling effort
among the content sources is always the same independent of the time epoch and, in particular,
does not depend on any ‘state variable(s)’ evolving with time. With a dynamic policy, for
instance, if there is not much new material on the principal information sources, the crawler
could spend some time crawling the sources with less popular content which nevertheless
bring a noticeable rate of clicks or increase information diversity. Therefore, in the present work we
suggest a dynamic formulation of the problem as an optimal control problem with average reward.
The direct application of dynamic programming quickly becomes intractable even with a moderate
number of information sources, due to the so-called curse of dimensionality. Fortunately, the
problem admits a Whittle index, which leads to problem decomposition and to a very simple 
and efficient crawling policy. We derive the Whittle index and provide its theoretical justification. 

In [5, 16, 25] the authors study the interaction between the crawler and the indexing engine
by means of optimization and control theoretic approaches. One interesting future research
direction is to take into account the indexing engine dynamics in the present context.

The general concept of the Whittle index was introduced by P. Whittle in [28]. This has been
a very successful heuristic for restless bandits, which, while suboptimal in general, is provably 
optimal in an asymptotic sense UgHZi and has good empirical performance. It and its variants 
have been used extensively in logistical and engineering applications, some recent instances of the 
latter in communications and control being for sensor scheduling cni, multi-UAV coordination 
|20| . congestion control nil US], channel allocation in wireless networks m, cognitive radio 
m and real-time wireless multicast [53]. Book length treatments of indexable restless bandits 
appear in naHn. 


2 Model 

There are $N$ sources of ephemeral content. A content at source $i \in \{1,\dots,N\}$ is published with an initial utility modelled by a nonnegative random variable $\xi_i$ and decreasing exponentially over time with a deterministic rate $\mu_i$. New content arrives at source $i \in \{1,\dots,N\}$ according to a time-homogeneous Poisson process with rate $\Lambda_i$. Thus, if source $i$’s content is crawled $t$ time units after its creation, its utility is given by $\xi_i \exp(-\mu_i t)$. The base utility $\xi_i$ is assumed independent identically distributed across contents at a given source, with a finite mean $\bar{\xi}_i$. It is also assumed independent across sources. We assume that the crawler crawls periodically at multiples of time $T > 0$ and has to choose at each such instant which sources to crawl, subject to a constraint we shall soon specify. When the crawler crawls a content source, we assume that the crawling is done in an exhaustive manner. In such a case, the crawler obtains the following expected reward from crawling the content of source $i$:

$$ u_i = \Lambda_i E\left[\int_0^T \xi_i \exp(-\mu_i t)\,dt\right] = \frac{\Lambda_i \bar{\xi}_i}{\mu_i}\left(1 - \exp(-\mu_i T)\right). \tag{1} $$

Set $\alpha_i = \exp(-\mu_i T)$. Let us define the state of source $i$ at time $t$ as the total expected utility of its content, denoted by $X_i(t)$. Then, if we do not crawl source $i$ at epoch $t$ (formally, the control is $v_i(t) = 0$; we say the source is ‘passive’), we obtain zero reward $r_i(X_i(t), v_i(t)) = 0$ and the state evolves as follows:

$$ X_i(t+1) = \alpha_i X_i(t) + u_i. \tag{2} $$

On the other hand, if we crawl source $i$ (formally, $v_i(t) = 1$; we say the source is ‘active’), we obtain the expected reward $r_i(X_i(t), v_i(t)) = X_i(t)$ and the next state of the source is given by

$$ X_i(t+1) = u_i. \tag{3} $$
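The model above can be sketched in a few lines of Python. This is only an illustration: the function names `batch_utility` and `step` are hypothetical, and the sample numbers (arrival rate 250, mean utility 1.0, decay rate 0.7, $T = 1$) are the ones listed for source 1 in Table 1.

```python
import math

def batch_utility(arrival_rate, mean_utility, decay, period=1.0):
    """Expected utility u_i accumulated over one crawling period, eq. (1)."""
    return arrival_rate * mean_utility / decay * (1.0 - math.exp(-decay * period))

def step(x, crawl, u, alpha):
    """One epoch of the dynamics: returns (reward, next_state).

    crawl=False is the passive case (2); crawl=True is the active case (3).
    """
    if crawl:
        return x, u
    return 0.0, alpha * x + u

# Source 1 of Table 1: Lambda = 250, mean utility 1.0, mu = 0.7, T = 1.
u1 = batch_utility(250.0, 1.0, 0.7)
alpha1 = math.exp(-0.7 * 1.0)
reward, nxt = step(u1, crawl=False, u=u1, alpha=alpha1)
print(round(u1, 2), reward, round(nxt, 2))
```

With these parameters `u1` agrees, up to rounding, with the value 179.79 quoted in Section 8 for the policy that crawls only source 1 at every epoch.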


Our aim is to maximize the long run average reward

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{i=1}^{N} \sum_{m=0}^{t} r_i(X_i(m), v_i(m)) \tag{4} $$

subject to the constraint

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{i=1}^{N} \sum_{m=0}^{t} C_i v_i(m) = M \tag{5} $$

for a prescribed $M > 0$. If $C_i = 1$, $i = 1,\dots,N$, this case can be interpreted as a constraint on the number of crawled sites per crawling period $T$ and corresponds to the original Whittle framework for restless bandits [28]. The case $C_i \neq 1$ is slightly more general and can represent the situation when various sites have different limits on the crawling rates (typically specified in the file ‘robots.txt’).

This is a constrained average reward control problem [TJ [22] • We address this problem in the 
framework of restless bandits and derive a simple index policy for the problem, which may be 
viewed as a variant of the celebrated Whittle index. In the next section, we recall the theory of 
Whittle index. 


3 Whittle index 

The original formulation of restless bandits is for discrete state space Markov chains, but we consider here Markov chains with closed domains (i.e., closure of an open set) $S_i \subset \mathbb{R}^d$, $d \ge 1$, as state space. The original motivation for the index policy remains valid nevertheless as long as we justify the associated dynamic programming equation, which we do. A deterministic dynamics such as ours is a special case, albeit degenerate. The fully stochastic case can be handled similarly and is detailed in the report [2]. While we introduce the broader framework in a general setup, we use the same notation as above to highlight the correspondences. This should not cause any confusion.

Thus consider resp. $S_i$-valued processes $X_i(t), t \ge 0, 1 \le i \le N$, each with two possible dynamics, dubbed active and passive, wherein they are governed by transition kernels $p_i(dy|x)$, $q_i(dy|x)$ resp. These are assumed to be continuous as maps $x \in S_i \mapsto \mathcal{P}(S_i)$ (:= the space of probability measures on $S_i$ with Prohorov topology). The control at time $t$ is an $A := \{0,1\}^N$-valued vector $v(t) = [v_1(t), \dots, v_N(t)] \in A$, with the understanding that $v_i(t) = 1 \iff X_i(t)$ is active. In the original restless bandit problem, exactly $N' < N$ processes are active at any given time. The $v_i(t)$ are assumed to be adapted to the history, i.e., the $\sigma$-field $\sigma(X_i(s), s \le t; v_i(s), s < t; 1 \le i \le N)$. Let $r_i : S_i \to \mathbb{R}$, $1 \le i \le N$, be reward functions so that a reward of $r_i(X_i(t))$ is accrued if process $i$ is active at time $t$. The objective then is to maximize the long run average reward

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{i=1}^{N} \sum_{m=0}^{t} E\left[r_i(X_i(m))v_i(m)\right]. $$

This problem has state space $\prod_{i=1}^{N} S_i$. Whittle’s heuristic among other things reduces the problem to separate control problems on $S_i$. The idea is to relax the constraint of ‘exactly $N'$ are active’ to ‘on the average, $N'$ are active’, i.e., to

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{s=0}^{t} \sum_{i=1}^{N} E[v_i(s)] = N'. $$


This makes it a constrained average reward control problem jU 122] which permits a relaxation 
to an unconstrained average reward problem by replacing the above reward by 


$$ \limsup_{t \uparrow \infty} \sum_{i=1}^{N} \frac{1}{t} \sum_{s=0}^{t} E\left[r_i(X_i(s))v_i(s) + \lambda\left(N'/N - v_i(s)\right)\right], $$

where $\lambda \in \mathbb{R}$ is the Lagrange multiplier. Motivated by this, Whittle introduced a ‘subsidy’ $\lambda$ for passivity, i.e., a virtual reward for a process in passive mode. Replace the above control problem by $N$ control problems with the $i$th problem for process $X_i(\cdot)$ seeking to maximize over admissible $v_i(t), t \ge 0$, the reward

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{s=0}^{t} E\left[r_i(X_i(s))v_i(s) + \lambda\left(N'/N - v_i(s)\right)\right]. \tag{6} $$

The dynamic programming equation for this average reward problem is

$$ V_i(x) + \beta = \max\left(\lambda + \int q_i(dy|x)V_i(y),\; r_i(x) + \int p_i(dy|x)V_i(y)\right). \tag{7} $$

If this can be rigorously justified (which is not always easy), one defines $B(\lambda)$ as the set of passive states, i.e.,

$$ B(\lambda) := \left\{x : \lambda + \int q_i(dy|x)V_i(y) > r_i(x) + \int p_i(dy|x)V_i(y)\right\}. $$

If $B(\lambda)$ increases monotonically from $\emptyset$ to $S_i$ as $\lambda$ increases from $-\infty$ to $\infty$, the problem is said to be Whittle indexable. The Whittle index for the $i$th process in state $x_i$ is then defined as

$$ \left\{\lambda' : \lambda' + \int q_i(dy|x_i)V_i(y) = r_i(x_i) + \int p_i(dy|x_i)V_i(y)\right\}. $$

The so-called ‘Whittle index policy’ [28] then is to set $v_i(t) = 1$ for the $i$ with the top $N'$ indices and $v_j(t) = 0$ for the rest.
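The selection step of the index policy can be sketched as follows. This is a minimal Python illustration; the helper name `index_policy` and the tie-breaking rule (lower position first) are assumptions made for the example, not part of the original formulation.

```python
def index_policy(indices, n_active):
    """Return a 0/1 control vector activating the n_active largest indices.

    Ties are broken by position (lower position first), an arbitrary choice.
    """
    order = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    chosen = set(order[:n_active])
    return [1 if i in chosen else 0 for i in range(len(indices))]

# Four processes with hypothetical index values, two of which may be active.
print(index_policy([0.3, 1.2, -0.5, 0.9], n_active=2))
```

The index values themselves are computed separately for each process, which is what makes the policy cheap: the joint state space is never enumerated.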


4 Dynamic programming equation 

In view of the above, the first step is to justify the counterpart of (7) in our context. For this, we first note that $r_i(x) = x$, $1 \le i \le N$. Further, let $u^* := u_i/(1-\alpha_i) > u_i$. We argue that without loss of generality, we may take $S_i = [u_i, u^*]$. To see this, let $X_i(0) = x_0$. If $x_0 \le u^*$, it is easy to see that

$$ X_i(t) \le \alpha_i^t x_0 + (1 - \alpha_i^t)u^* \uparrow u^*, $$

where the equality in the first inequality occurs only if source $i$ is never crawled. On the other hand, if $x_0 > u^*$, then

$$ X_i(t) = \alpha_i^t x_0 + (1 - \alpha_i^t)u^* \downarrow u^* \text{ as } t \uparrow \infty, $$

if never crawled, and reduces to the previous case if there is even a single crawl. Combining the two observations and recalling that we consider the long-run average criterion, we conclude that $x_0 \notin S_i$ are transient and can be ignored. Thus we set $S_i = [u_i, u^*]$.
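The two observations can be checked numerically with a short Python sketch. The values `u = 1.0` and `alpha = 0.5` are illustrative assumptions, not parameters from the paper, and `never_crawled_path` is a hypothetical helper name.

```python
def never_crawled_path(x0, u, alpha, steps):
    """Iterate the passive dynamics X(t+1) = alpha*X(t) + u from X(0) = x0."""
    path = [x0]
    for _ in range(steps):
        path.append(alpha * path[-1] + u)
    return path

u, alpha = 1.0, 0.5            # illustrative values, not from the paper
u_star = u / (1.0 - alpha)     # fixed point of the passive dynamics
below = never_crawled_path(u, u, alpha, 40)              # starts inside [u, u*]
above = never_crawled_path(3.0 * u_star, u, alpha, 40)   # starts above u*
print(below[-1], above[-1], u_star)
```

A path started inside the interval stays in it and increases towards the upper end point, while a path started above it decreases towards the same point, so states outside the interval are only visited transiently.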


Henceforth we focus on the average reward problem for source i. We do not delve into 
the justification for Lagrange multiplier formulation for constrained average cost problem on 
a general state space, as this is well understood. (In fact, it follows from standard Lagrange 
multiplier theory applied to the ‘occupation measure’ formulation of average cost problem which 
casts it as an abstract linear program. See section 4.2 of [B] which carries out this program for 
discrete state space and section 3.2 of ibid, which describes how to extend the same to general 
compact Polish state spaces as long as the controlled transition probability kernel is continuous 
in the initial state and control.) For notational simplicity we drop the index i for the time being. 
We approach the problem by the standard ‘vanishing discount’ argument. Thus let $0 < \delta < 1$ be a discount factor and for $k(x, v) := xv + C\lambda(1 - v)$, consider the infinite horizon discounted reward

$$ \sum_{m=0}^{\infty} \delta^m k(X(m), v(m)). $$

Denote the associated value function by

$$ V_\delta(x) := \sup_{\{v(t)\},\, X(0)=x} \sum_{m=0}^{\infty} \delta^m k(X(m), v(m)). $$

Then $V_\delta$ satisfies the discounted reward dynamic programming equation

$$ V_\delta(x) = \max\left(C\lambda + \delta V_\delta(\alpha x + u),\; x + \delta V_\delta(u)\right). \tag{8} $$

Lemma 1 The solution of equation (8) has the following properties:

(1) Equation (8) has a unique bounded continuous solution $V_\delta$;

(2) $V_\delta$ is Lipschitz uniformly in $\delta \in (0,1)$;

(3) $V_\delta$ is monotone increasing and convex.

Proof: Claim (1) is standard (see Theorem 4.2.3 and bullet 1 in ‘Notes on §4.2’, Section 4.2, [11]). For (2), consider $x' > x$ in $S$. Consider processes $X(t), t \ge 0$, and $X'(t), t \ge 0$, with initial conditions $x, x'$ resp., both controlled by a control sequence $v(t), t \ge 0$, that is optimal for the latter. Then

$$ V_\delta(x') - V_\delta(x) \le \sum_{t=0}^{\infty} \delta^t \left(k(X'(t), v(t)) - k(X(t), v(t))\right) \le (\alpha\delta)^\tau (x' - x), $$

where $\tau$ := the time of first crawl ($= \infty$ if never crawled). Interchanging the roles of $x', x$ we get a symmetric inequality, whence it follows that

$$ |V_\delta(x') - V_\delta(x)| \le (\alpha\delta)^\tau |x' - x|. $$

Since $\alpha\delta < 1$, $V_\delta$ is Lipschitz with constant 1, uniformly in $\delta$. For the first part of (3), take $x' > x$ as above and let $X'(t), X(t), t \ge 0$, be processes generated by a common admissible control sequence $\{v(t)\}$ with initial conditions $x', x$ resp. Then it is easy to check that $X'(t) \ge X(t)$ for all $t$. Therefore

$$ \sum_{t=0}^{\infty} \delta^t k(X'(t), v(t)) \ge \sum_{t=0}^{\infty} \delta^t k(X(t), v(t)). \tag{9} $$

Taking supremum over all admissible controls on both sides, monotonicity of $V_\delta$ follows. For convexity, define the finite horizon discounted value function

$$ V_n(x) = \sup_{\{v(t)\},\, X(0)=x} \sum_{t=0}^{n} \delta^t k(X(t), v(t)). $$

Then it satisfies the dynamic programming equation

$$ V_n(x) = \max\left(C\lambda + \delta V_{n-1}(\alpha x + u),\; x + \delta V_{n-1}(u)\right) $$

for $n \ge 1$ with $V_0(x) = x$. The convexity of $V_n$ for each $n$ then follows by a simple induction. Since $V_\delta(x) = \lim_{n \to \infty} V_n(x)$, $V_\delta$ is also convex. □
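The discounted equation (8) can be explored numerically by value iteration. The Python sketch below is an illustration under stated assumptions, not the construction used in the proof: it restricts (8) to the states $x_k = u(1-\alpha^k)/(1-\alpha)$ reachable $k$ epochs after a crawl, truncates the passive transition at a hypothetical cap `n_states`, and uses illustrative parameter values.

```python
def discounted_values(u, alpha, c_lambda, delta, n_states=200, sweeps=2000):
    """Value iteration for (8) restricted to the states x_k reachable after a crawl.

    x_k = u * (1 - alpha**k) / (1 - alpha), k = 1..n_states; the passive
    transition maps x_k to x_{k+1} and is truncated at k = n_states.
    """
    xs = [u * (1.0 - alpha ** k) / (1.0 - alpha) for k in range(1, n_states + 1)]
    values = [0.0] * n_states
    for _ in range(sweeps):
        new = []
        for k in range(n_states):
            passive = c_lambda + delta * values[min(k + 1, n_states - 1)]
            active = xs[k] + delta * values[0]
            new.append(max(passive, active))
        values = new
    return xs, values

# Illustrative parameters (assumptions): u = 1, alpha = 0.5, C*lambda = 1.2, delta = 0.9.
xs, vals = discounted_values(u=1.0, alpha=0.5, c_lambda=1.2, delta=0.9)
print(vals[:5])
```

On this grid the computed values are non-decreasing in the state and their increments are bounded by the increments of the state, which matches claims (2) and (3) of Lemma 1 in this truncated setting.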


Define $\bar{V}_\delta(x) = V_\delta(x) - V_\delta(u)$, $x \in S$. Then by the above lemma, $\bar{V}_\delta$ is bounded Lipschitz, monotone and convex with $\bar{V}_\delta(u) = 0$. Also, $(1 - \delta)V_\delta(u)$ is bounded. Using the Arzelà-Ascoli and Bolzano-Weierstrass theorems, we may pick a subsequence such that $(\bar{V}_\delta, (1 - \delta)V_\delta(u))$ converge in $C(S) \times \mathbb{R}$ to (say) $(V, \beta)$. From (8), we have

$$ \bar{V}_\delta(x) + (1 - \delta)V_\delta(u) = \max\left(C\lambda + \delta \bar{V}_\delta(\alpha x + u),\; x\right). $$

Passing to the limit along an appropriate subsequence as $\delta \uparrow 1$, we have

$$ V(x) + \beta = \max\left(C\lambda + V(\alpha x + u),\; x\right) \tag{10} $$
$$ \phantom{V(x) + \beta} = \max_{v \in \{0,1\}} \left(vx + (1 - v)\left(C\lambda + V(\alpha x + u)\right)\right). \tag{11} $$

Then (10) is the desired dynamic programming equation for average reward. We study important structural properties of the value function $V$ in the next section.


5 Properties of the value function 

We begin with the following result. 

Lemma 2 The following statements hold:

(1) $V$ is monotone increasing and convex with $V(u) = 0$;

(2) The maximizer on the right hand side of (11) is the optimal control choice at state $x$ and $\beta$ is the optimal reward.

Proof: Since monotonicity and convexity are preserved in pointwise limits, the first claim is immediate. For the second, let $v^*(x)$ denote the maximizer on the r.h.s. of (11), any tie being settled arbitrarily. Then under $\{v(t) = v^*(X(t)), t \ge 0\}$,

$$ V(X(t)) + \beta = k(X(t), v(t)) + V(X(t+1)). \tag{12} $$

Summing (12) over $t = 1, 2, \dots, T$, and dividing by $T$ on both sides, then letting $T \uparrow \infty$, we see that $\beta$ = the average reward under this control policy. On the other hand, for any other control sequence, the equality in (12) will be replaced by $\ge$, leading to the conclusion that $\beta \ge$ the corresponding average reward by an argument similar to the above. This implies the second claim. □

Now define

$$ B := \{x \in S : C\lambda + V(\alpha x + u) > x\}, $$
$$ B^c := \{x \in S : C\lambda + V(\alpha x + u) \le x\}. $$

These are respectively the sets of passive and active states under subsidy $\lambda$.


Recall the stopping time $\tau$ := the time of first crawl. Suppose $\tau < \infty$. (The case $\tau = \infty$ corresponds to ‘never crawl’ which we consider separately below.) Under optimal policy, iterating equation (10) $\tau$ times leads to

$$ V(x) = (C\lambda - \beta)\tau + \alpha^\tau x + \left(\frac{1 - \alpha^\tau}{1 - \alpha}\right)u - \beta. $$

Under any other policy, we would likewise obtain

$$ V(x) \ge (C\lambda - \beta)\tau + \alpha^\tau x + \left(\frac{1 - \alpha^\tau}{1 - \alpha}\right)u - \beta. $$

Thus we have the explicit representation for $V$ given by

$$ V(x) = \max\left[(C\lambda - \beta)\tau + \alpha^\tau x + \left(\frac{1 - \alpha^\tau}{1 - \alpha}\right)u - \beta\right], $$

where the maximum is over all admissible control sequences. In particular, this implies:

Lemma 3 Equation (10) has a unique solution.

Finally, we have the key lemma:

Lemma 4 The above problem is Whittle indexable.

Proof: Since $V$ is monotone increasing and convex, the map

$$ x \mapsto x - V(\alpha x + u) $$

is concave and hence the set $B$ increases monotonically from $\emptyset$ to $S$ as $\lambda$ increases from $-\infty$ to $\infty$. The claim now follows from the definition of Whittle indexability. □


We shall now eliminate some irrelevant situations.

1. If $u^* \in B$, i.e., the optimal action at $u^*$ is 0, then $u^*$ is a fixed point of the optimally controlled dynamics and the corresponding cost is $C\lambda$. Then $\beta = C\lambda$ and it is optimal to be passive at all states, i.e., $B = [u, u^*]$, $B^c = \emptyset$, and

$$ \lambda > \lambda_M := \max_{x \in [u, u^*]} \left(x - V(\alpha x + u)\right)/C. \tag{13} $$

2. If $u \in B^c$, then from (10), $0 + \beta = u + 0$, i.e., $\beta = u$ and it is optimal to crawl when at $u$. Then it is a fixed point of the controlled dynamics and it is optimal to be active at all states, i.e., $B^c = [u, u^*]$, $B = \emptyset$, and

$$ \lambda < \lambda_m := \min_{x \in [u, u^*]} \left(x - V(\alpha x + u)\right)/C. \tag{14} $$

Note that since constant policies $v(t) = 0$ and $v(t) = 1$ lead to costs $C\lambda$ and $u$ resp., $\beta \ge (C\lambda) \vee u$ always and $\beta > (C\lambda) \vee u$ for $\lambda \in (\lambda_m, \lambda_M)$. For each $\lambda$ in $(\lambda_m, \lambda_M)$, both $B, B^c$ are non-empty and there exists an $a \in (u, u^*)$ for which the choice of being active or passive is equally desirable. Furthermore, this $a$ is an increasing function of $\lambda$ by Lemma 4. Inverting this function, we have $\gamma(x)$ := the value of $\lambda$ at which the active and passive become equally desirable choices, as an increasing function of $x \in (u, u^*)$.

Lemma 5 The sets $B, B^c$ are of the form $[u, a), [a, u^*]$ for some $a \in [u, u^*]$.

Proof: Since $V$ is convex, one of the following two must hold:

1. For some $a_2 > a_1$, $B = [u, a_1) \cup (a_2, u^*]$ and $B^c = [a_1, a_2]$, or,

2. for some $a$, $B = [u, a)$, $B^c = [a, u^*]$.

However, since at $u^*$ the optimal action is to crawl, we conclude that $u^* \in B^c$ and only the second possibility can occur. □

Corollary 1 The map $x \mapsto x - V(\alpha x + u)$ is monotone non-decreasing on $[u, u^*]$.


6 Derivation of Whittle index 


Consider the situation when $\lambda = \gamma(x)$ for a prescribed $x \in (u, u^*)$. It is clear that after the first crawl when the process is reset to $u$, the optimal $X(t)$ becomes periodic: not crawling and increasing till it hits $B^c$ and then crawling (thereby being reset to $u$) to repeat the process. Since finite initial patches do not affect the long run average reward, we may then take $X(0) = u$. Define $\eta(x) = \min\{t : X(t) \in B^c\}$. Then

$$ X(\eta(x)) = \left(1 - \alpha^{\eta(x)}\right)\frac{u}{1 - \alpha} \tag{15} $$
$$ \Longrightarrow \eta(x) = \left\lceil \log_\alpha^+\left(1 - \frac{(1 - \alpha)x}{u}\right) \right\rceil, \tag{16} $$

where $\log_\alpha^+ x = \log_\alpha x \cdot I\{x > 0\}$. Since the long run average cost is equal to the average over one period, we can write

$$ \beta(x) = \frac{C\lambda(\eta(x) - 1) + X(\eta(x))}{\eta(x)}, \tag{17} $$

where $\eta(x)$ is given by (16) and $X(\eta(x))$ is given by (15).
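The average over one period in (17) is easy to evaluate for any candidate period. The Python sketch below is illustrative only: the function names are hypothetical, the period is passed directly as an integer `k` rather than derived from a threshold state, and the brute-force search cap `max_period` is an assumption.

```python
def state_after(k, u, alpha):
    """State reached k epochs after a crawl, i.e. the right-hand side of (15)."""
    return u * (1.0 - alpha ** k) / (1.0 - alpha)

def cycle_average_reward(k, c_lambda, u, alpha):
    """Average reward (17) of the periodic policy that crawls every k-th epoch."""
    return (c_lambda * (k - 1) + state_after(k, u, alpha)) / k

def best_period(c_lambda, u, alpha, max_period=1000):
    """Period maximising (17), searched by brute force up to max_period."""
    return max(range(1, max_period + 1),
               key=lambda k: cycle_average_reward(k, c_lambda, u, alpha))

# Illustrative values (assumptions): u = 2, alpha = 0.5, so u* = 4.
print(best_period(0.0, 2.0, 0.5), best_period(10.0, 2.0, 0.5))
```

With no subsidy the best period is one (crawl at every epoch), while a subsidy larger than $u^*$ pushes the best period to the search cap, which corresponds to the two degenerate situations eliminated in Section 5.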


We now revert to using the index $i$ to identify the source being referred to. In particular, $\beta_i, \lambda_i$ will refer to the optimal reward, resp. Lagrange multiplier, for the $i$th decoupled problem. Our main result is:

Theorem 1 The Whittle index for our problem is given by

$$ \gamma_i(x) := \frac{1}{C_i}\left[\eta_i(x)\left((1 - \alpha_i)x - u_i\right) + \frac{1 - \alpha_i^{\eta_i(x)}}{1 - \alpha_i}\,u_i\right], $$

where

$$ \eta_i(x) = \left\lceil \log_{\alpha_i}^+\left(\frac{u_i - (1 - \alpha_i)x}{u_i}\right) \right\rceil. $$

Therefore the index policy is to crawl at time $t$ ($= mT$ for some $m \ge 0$) the top $M$ sources according to decreasing values of $\gamma_i(X_i(t))$, or alternatively, choose a number of top sources for the constraint to be reached.
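A minimal Python sketch of the index formula of Theorem 1 follows. The function names `eta` and `whittle_index` are hypothetical, the small tolerance inside the ceiling is an implementation assumption to absorb floating-point rounding, and the sample state is the one source 1 of Table 1 reaches two epochs after a crawl.

```python
import math

def eta(x, u, alpha):
    """Period eta_i(x) of Theorem 1; log^+ is taken as 0 for non-positive arguments."""
    arg = (u - (1.0 - alpha) * x) / u
    if arg <= 0.0:
        return 0
    # Small tolerance so that rounding error at x = x_k does not push the ceiling to k + 1.
    return max(0, math.ceil(math.log(arg) / math.log(alpha) - 1e-9))

def whittle_index(x, u, alpha, cost=1.0):
    """Whittle index gamma_i(x) of Theorem 1."""
    n = eta(x, u, alpha)
    return (n * ((1.0 - alpha) * x - u) + (1.0 - alpha ** n) / (1.0 - alpha) * u) / cost

# Source 1 of Table 1, evaluated at the state reached two epochs after a crawl.
alpha1 = math.exp(-0.7)
u1 = 250.0 * 1.0 / 0.7 * (1.0 - alpha1)
x2 = u1 * (1.0 + alpha1)
print(eta(x2, u1, alpha1), whittle_index(x2, u1, alpha1))
```

A useful sanity check is the indifference property behind the index: with the subsidy set to the index at the state reached after two epochs, crawling with period two and with period three give the same average reward in (17).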


Remark: Note that if an arm (say, $i$th) is crawled even once, the corresponding state process $\{X_i(t)\}$ takes only discrete values thereafter. These depend on $\alpha_i$ and $u_i$ alone. In fact this is also true for an arm that is never crawled, except that the discrete values taken will also depend on the initial condition. Therefore we need restrict attention to only these values of $x$ for the argument of $\gamma_i(\cdot)$. This results in a further simplification of the index formula, to

$$ \gamma_i(x) = \frac{1}{C_i}\left[\eta_i(x)\left((1 - \alpha_i)x - u_i\right) + x\right], $$

where $\eta_i(x)$ is as before, but the argument $x$ of both $\gamma_i$ and $\eta_i$ is now restricted to the aforementioned discrete set.


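Proof: We drop the subscript $i$ for notational convenience. For $x \in B^c$, (10) leads to $V(x) = x - \beta$. Also, for $x' := \alpha x + u$,

$$ x < \frac{u}{1 - \alpha} \Longrightarrow x' = \alpha x + u > \alpha x + (1 - \alpha)x \Longrightarrow x' > x \Longrightarrow x' \in B^c \text{ (by Lemma 5)} \Longrightarrow V(x') = x' - \beta. $$

Combining this with (10) and the definition of Whittle index implies that for our problem it is

$$ \gamma_i(x) = \frac{(1 - \alpha_i)x - u_i + \beta_i(x)}{C_i}, \tag{18} $$

where by virtue of (17), $\beta_i(x)$ := the optimal cost if one were to set $\lambda = \gamma_i(x)$. The latter is given by:

$$ \beta_i(x) := \frac{1}{\eta_i(x)}\left[C_i\gamma_i(x)\left(\eta_i(x) - 1\right) + \frac{1 - \alpha_i^{\eta_i(x)}}{1 - \alpha_i}\,u_i\right], $$

where

$$ \eta_i(x) := \left\lceil \log_{\alpha_i}^+\left(\frac{u_i - (1 - \alpha_i)x}{u_i}\right) \right\rceil. $$

Substituting this back into (18), one gets a linear equation for $\gamma_i(x)$ that can be solved to evaluate $\gamma_i(x)$ as

$$ \gamma_i(x) = \frac{1}{C_i}\left[\eta_i(x)\left((1 - \alpha_i)x - u_i\right) + \frac{1 - \alpha_i^{\eta_i(x)}}{1 - \alpha_i}\,u_i\right]. $$

This completes the proof. □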


7 Stochastic case 

We now consider the fully stochastic situation when traffic at each source is observed as a random variable. In fact one could also consider mixed situations when some sources are observed and others are not. As we shall see, the development closely mimics the foregoing and the Whittle index is actually the same.

The stochastic system dynamics can be described as follows: Let $\{\tau_n^i\}$ denote the successive arrival times of content at source $i$, with utilities $\{\xi_n^i\}$, resp. The net utility added to source $i$ during the $k$-th epoch will be

$$ U_i(k) := \sum_{n :\, (k-1)T \le \tau_n^i < kT} \xi_n^i \exp\left(-\mu_i(kT - \tau_n^i)\right). $$

The system state at time $(k+1)T$ is then

$$ X_i(k+1) = \begin{cases} \alpha_i X_i(k) + U_i(k+1) & \text{if no crawl}, \\ U_i(k+1) & \text{if crawled}. \end{cases} \tag{19} $$
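The random input of the stochastic dynamics can be sampled with a short Python sketch. The helper name `sample_epoch_utility` is hypothetical, and the example assumes, purely for illustration, that every item of source 1 has utility exactly 1.0, so that the sample mean can be compared with the deterministic value from (1).

```python
import math
import random

def sample_epoch_utility(rate, decay, period, draw_utility, rng):
    """Sample U_i(k): discounted utility of content arriving during one epoch.

    Arrivals follow a Poisson process with the given rate on [0, period);
    content arriving at time s is worth draw_utility(rng) * exp(-decay * (period - s))
    at the end of the epoch.
    """
    total, s = 0.0, rng.expovariate(rate)
    while s < period:
        total += draw_utility(rng) * math.exp(-decay * (period - s))
        s += rng.expovariate(rate)
    return total

rng = random.Random(0)
# Assumption for illustration: every item of source 1 has utility exactly 1.0.
samples = [sample_epoch_utility(250.0, 0.7, 1.0, lambda r: 1.0, rng) for _ in range(1000)]
mean_u = sum(samples) / len(samples)
print(mean_u)
```

The sample mean should be close to the expected one-period utility given by (1), which is why the deterministic model can be read as the mean dynamics of the stochastic one.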

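We define the average reward as

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{i=1}^{N} \sum_{m=0}^{t} E\left[r_i(X_i(m), v_i(m))\right], $$

which we seek to maximize subject to the constraint

$$ \limsup_{t \uparrow \infty} \frac{1}{t} \sum_{i=1}^{N} \sum_{m=0}^{t} C_i E[v_i(m)] = M. $$

The discounted value function

$$ V_\delta(x) := \sup_{\{v(t)\},\, X(0)=x} E\left[\sum_{t=0}^{\infty} \delta^t k(X(t), v(t))\right] $$

then satisfies the dynamic programming equation

$$ V_\delta(x) = \max\left(C\lambda + \delta \int V_\delta(\alpha x + u)\varphi_i(du),\; x + \delta \int V_\delta(u)\varphi_i(du)\right), \tag{20} $$

where $\varphi_i$ is the law of $U_i(t)$ $\forall t$.

Lemma 6 The conclusions of Lemma 1 continue to hold.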


Proof: The first claim follows as before from the cited results of [11]. For the second, let $X(t), X'(t)$ be as in the proof of Lemma 1 (2). Then

$$ V_\delta(x') - V_\delta(x) \le E\left[\sum_{t=0}^{\infty} \delta^t \left(k(X'(t), v(t)) - k(X(t), v(t))\right)\right] \le E\left[(\alpha\delta)^\tau\right](x' - x). $$

The Lipschitz property follows as before. Next let $X(t), X'(t)$ be as in the proof of Lemma 1 (3). Taking expectations in (9) followed by a supremum over all admissible controls proves monotonicity. Convexity follows as in the deterministic case. □


The ‘vanishing discount’ argument of Section 4 can now be used to establish the average cost dynamic programming equation

$$ V(x) + \beta = \max\left(C\lambda + \int V(\alpha x + u)\varphi(du),\; x\right). \tag{21} $$

Monotonicity and convexity of $V$ follow as in Lemma 2. Equation (12) gets modified to

$$ E[V(X(t))] + \beta = E[k(X(t), v(t))] + E[V(X(t+1))], $$

from which the optimality of

$$ v^*(x) \in \operatorname{Argmax}_{v \in \{0,1\}} \left(vx + (1 - v)\left(C\lambda + \int V(\alpha x + u)\varphi(du)\right)\right), \quad x \in S, $$

follows by arguments analogous to those of Lemma 2. Furthermore, $V$ can be shown to be the unique solution of (21) by establishing the explicit representation

$$ V(x) = \max E\left[(C\lambda - \beta)\tau + \alpha^\tau x + \sum_{t=0}^{\tau} \alpha^{\tau - t}U(t) - \beta\right], $$

where the maximum is over all admissible control sequences. Thus, Whittle indexability follows 
as before. Define 


ri{x) 



A(0) = cc 




S(a;) := E 

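The definitions of $B, B^c$ change to

$$ B := \left\{x \in S : C\lambda + \int V(\alpha x + u)\varphi(du) > x\right\}, $$
$$ B^c := \left\{x \in S : C\lambda + \int V(\alpha x + u)\varphi(du) \le x\right\}. $$

Let $\eta_m, m \ge 1$, denote the successive visits to $B^c$, i.e., the crawling times. Then

$$ X(\eta_{m+1}) = \sum_{t=\eta_m}^{\eta_{m+1}-1} \alpha^{\eta_{m+1}-1-t}\,U(t+1), \quad m \ge 1. $$

As before, we may assume that $\eta_1(x) = 0$. Then the expression (16) for $\eta_2(x)$ will continue to hold. We denote it by $\eta(x)$ as before for notational convenience. Therefore

$$ \beta(x) = \frac{C\lambda(\eta(x) - 1) + E[X(\eta(x))]}{\eta(x)} = \frac{C\lambda(\eta(x) - 1) + \left(1 - \alpha^{\eta(x)}\right)\frac{u}{1 - \alpha}}{\eta(x)} $$

as before. Hence the conclusions of Theorem 1 continue to hold.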


8 Numerical examples 

Let us illustrate the obtained theoretical results by numerical examples. There are four information sources with parameters given in Table 1. Without loss of generality, we take the crawling period $T = 1$. One can see how the user interest decreases over time for each source in Figure 1. The initial interest in the content of sources 1 and 2 is high, whereas the initial interest in the content of sources 3 and 4 is relatively small. The interest in the content of sources 1 and 3 decreases faster than the interest in the content of sources 2 and 4.

In Figure 2 we show the state evolution of the bandits (information sources) under the constraint that on average the crawler can visit only one site per crawling period $T$, i.e., $M = 1$. The application of Whittle index results in periodic crawling of sources 1 and 2, crawling each with period two. Sources 3 and 4 are never crawled. Note that if one greedily crawls only source 1, one obtains the average reward 179.79. In contrast, the index policy involving two sources results in the average reward 254.66.

K.E. Avrachenkov & V.S. Borkar, RR n° 8702

In Figure 3 we show the state evolution of the bandits under the constraint that on average the crawler can visit two information sources per crawling period, i.e., $M = 2$. It is interesting that now the policy becomes much less regular. Source 1 is always crawled. Sources 2 and 3 are crawled in a non-trivial periodic way and source 4 is crawled periodically with a rather long period. Now in Figure 4 we present the state evolution of the stochastic model with dynamics (19). As one can see, in the stochastic setting source 1 is crawled from time to time.
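The deterministic experiment with $M = 1$ can be approximated by the following Python sketch. It is an illustration under explicit assumptions rather than the authors' simulation: exactly `budget` sources are crawled at every epoch (a hard version of the averaged constraint), $C_i = 1$, all sources start at $X_i(0) = u_i$, and the simplified index of the Remark is used with the number of epochs since the last crawl tracked explicitly. The function name and the horizon of 200 epochs are hypothetical choices.

```python
import math

def simulate_index_policy(xi, mu, lam, budget, horizon, period=1.0):
    """Deterministic dynamics (2)-(3) with the top-`budget` sources crawled each epoch.

    Assumes C_i = 1 and X_i(0) = u_i, so that the simplified index of the Remark,
    gamma_i = k_i * ((1 - alpha_i) * x_i - u_i) + x_i, applies with k_i the number
    of epochs since the last crawl of source i.
    """
    n = len(xi)
    alpha = [math.exp(-m * period) for m in mu]
    u = [l * x / m * (1.0 - a) for l, x, m, a in zip(lam, xi, mu, alpha)]
    state, age = list(u), [1] * n
    total, crawl_counts = 0.0, [0] * n
    for _ in range(horizon):
        index = [age[i] * ((1.0 - alpha[i]) * state[i] - u[i]) + state[i] for i in range(n)]
        chosen = sorted(range(n), key=lambda i: -index[i])[:budget]
        for i in range(n):
            if i in chosen:
                total += state[i]
                crawl_counts[i] += 1
                state[i], age[i] = u[i], 1
            else:
                state[i], age[i] = alpha[i] * state[i] + u[i], age[i] + 1
    return total / horizon, crawl_counts

# Parameters of Table 1 with one crawl per epoch.
avg, counts = simulate_index_policy(
    xi=[1.0, 0.7, 0.2, 0.08], mu=[0.7, 0.35, 0.7, 0.21], lam=[250.0] * 4,
    budget=1, horizon=200)
print(avg, counts)
```

In this sketch sources 3 and 4 are never selected and the average reward exceeds that of crawling source 1 alone, in line with the qualitative description above; the exact average depends on the horizon and on how the constraint is enforced, so it need not coincide with the figure reported in the text.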


Table 1: Data for numerical example

$i$            1      2      3      4
$\xi_i$        1.0    0.7    0.2    0.08
$\mu_i$        0.7    0.35   0.7    0.21
$\Lambda_i$    250    250    250    250




Figure 1: Content value as a function of time. 



Figure 2: The case of M = 1. 


Figure 3: The case of M = 2.

Figure 4: The case of M = 2 (stochastic model).


9 Conclusions and future works

We have formulated the problem of crawling web sites with ephemeral content as an average reward optimal control problem and have shown that it is indexable. We have found that the Whittle index has a very simple form, which is important for efficient practical implementations. The numerical example demonstrates that the Whittle index policies, unlike the policies suggested in [15], do not generally have a trivial periodic structure. The proposed approach can also be used in the cases when some states are observed. In such cases, the Whittle index will act as a self-tuning mechanism. We are currently working on the adaptive version when some parameters (e.g., the rate of new information arrival) need to be estimated online. One more interesting future research direction is to add to the model the dynamics of the indexing engine.